-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Add a head merging transformation alongside the tail merging transformation #90468
Conversation
Add a pass that does head merging to compliment the existing tail merging pass. Unlike tail merging this requires reordering the first statement with the terminator node of the predecessor, which requires some interference checking. Fix dotnet#90017
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsAdd a pass that does head merging to compliment the existing tail merging pass. Unlike tail merging this requires reordering the first statement with the terminator node of the predecessor, which requires some interference checking. Fix #90017
|
cc @dotnet/jit-contrib PTAL @AndyAyersMS. Thoughts on this? Better to unify them more? Split them more apart? Diffs. Quite skewed because of one particular test, but still a fair amount of improvements, in a lot of cases creating new conditional selects (which both bring size-wise improvements and regressions). The reordering also can bring regressions with it, in the same way that we see from forward sub sometimes. There's also a consideration that tail merging and head merging can create new opportunities for each other, but this change doesn't try to rerun either of them when that potentially happens. |
/azp run Fuzzlyn, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 3 pipeline(s). |
src/coreclr/jit/fgopt.cpp
Outdated
{ | ||
if (gtHasRef(tree1, dsc->lvFieldLclStart + i)) | ||
{ | ||
JITDUMP(" cannot reorder with interferring use of struct field\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JITDUMP(" cannot reorder with interferring use of struct field\n"); | |
JITDUMP(" cannot reorder with interfering use of struct field\n"); |
similar elsehwere
Seems like a pretty natural fit. I always worry that interference checking (particularly when repeated) can be costly. Current TP cost does not look too bad, but suggests a lot of failed checks (since if we can head merge it likely improves TP). So I wonder if there are ways to fail faster? Do you know how often we end up moving multiple statements? |
The Fuzzlyn failures are #7539 (hitting this a lot after adding
For libraries.pmi the histogram of how many statements we head merge in each block (where we head merge any) looks like: 1 .. 1 ===> 4173 count ( 56% of total)
2 .. 2 ===> 972 count ( 70% of total)
3 .. 3 ===> 919 count ( 82% of total)
4 .. 4 ===> 366 count ( 87% of total)
5 .. 5 ===> 487 count ( 94% of total)
6 .. 10 ===> 360 count ( 99% of total)
11 .. 15 ===> 40 count ( 99% of total)
16 .. 20 ===> 5 count (100% of total) I can try to see if I can improve TP a bit -- for one we shouldn't need to iterate over all BBs twice if we just call both head and tail merging from one loop. |
The detailed TP breakdown is, after moving the head merging into Base: 115595112161, Diff: 115766296686, +0.1481%
?fgHeadMerge@Compiler@@QEAA_NPEAUBasicBlock@@AEAV?$ArrayStack@UPredSuccInfo@@@@_N@Z : 159867209 : NA : 20.48% : +0.1383%
`Compiler::fgHeadTailMerge'::`2'::<lambda_1>::operator() : 128143167 : NA : 16.42% : +0.1109%
?NumSucc@BasicBlock@@QEAAIPEAVCompiler@@@Z : 55787892 : +32.82% : 7.15% : +0.0483%
?fgHeadTailMerge@Compiler@@QEAA?AW4PhaseStatus@@_N@Z : 53696551 : NA : 6.88% : +0.0465%
?GetSucc@BasicBlock@@QEAAPEAU1@IPEAVCompiler@@@Z : 30154807 : +13.41% : 3.86% : +0.0261%
?Compare@GenTree@@SA_NPEAU1@0_N@Z : 15846220 : +21.62% : 2.03% : +0.0137%
`Compiler::compCompile'::`2'::<lambda_3>::operator() : 12277667 : NA : 1.57% : +0.0106%
??$Emplace@AEAUPredSuccInfo@@@?$ArrayStack@UPredSuccInfo@@@@QEAAXAEAUPredSuccInfo@@@Z : 5529537 : NA : 0.71% : +0.0048%
?compCompile@Compiler@@IEAAXPEAPEAXPEAIPEAVJitFlags@@@Z : 1684987 : +1.37% : 0.22% : +0.0015%
GenTreeVisitor<`Compiler::gtHasLocalsWithAddrOp'::`2'::LocalsWithAddrOpVisitor>::WalkTree : 1252917 : +1.84% : 0.16% : +0.0011%
?fgUpdateFlowGraph@Compiler@@QEAA_N_N0@Z : 1130065 : +0.08% : 0.14% : +0.0010%
BasicBlock::VisitAllSuccs<`Compiler::optReachable'::`12'::<lambda_1> > : 879816 : +0.47% : 0.11% : +0.0008%
?fgReorderBlocks@Compiler@@QEAA_N_N@Z : -790057 : -0.21% : 0.10% : -0.0007%
?newRefPosition@LinearScan@@AEAAPEAVRefPosition@@PEAVInterval@@IW4RefType@@PEAUGenTree@@_KI@Z : -801471 : -0.09% : 0.10% : -0.0007%
?RenameDef@SsaBuilder@@AEAAXPEAUGenTree@@PEAUBasicBlock@@@Z : -812236 : -0.86% : 0.10% : -0.0007%
?incRefCnts@LclVarDsc@@QEAAXNPEAVCompiler@@W4RefCountState@@_N@Z : -814351 : -0.27% : 0.10% : -0.0007%
?OptimizeRangeCheck@RangeCheck@@QEAAXPEAUBasicBlock@@PEAUStatement@@PEAUGenTree@@@Z : -825515 : -0.37% : 0.11% : -0.0007%
?genCodeForTreeNode@CodeGen@@IEAAXPEAUGenTree@@@Z : -825588 : -0.12% : 0.11% : -0.0007%
?MakeCopy@?$BitSetOps@PEA_K$00PEAVCompiler@@VTrackedVarBitSetTraits@@@@SAPEA_KPEAVCompiler@@PEA_K@Z : -835566 : -0.32% : 0.11% : -0.0007%
?fgValueNumberBlock@Compiler@@QEAAXPEAUBasicBlock@@@Z : -846018 : -0.31% : 0.11% : -0.0007%
?AddPhiArg@SsaBuilder@@AEAAXPEAUBasicBlock@@PEAUStatement@@PEAUGenTreePhi@@II0@Z : -862799 : -1.49% : 0.11% : -0.0007%
?getWeight@LinearScan@@AEAANPEAVRefPosition@@@Z : -866614 : -0.29% : 0.11% : -0.0007%
?UnionD@?$BitSetOps@PEA_K$00PEAVCompiler@@VTrackedVarBitSetTraits@@@@SAXPEAVCompiler@@AEAPEA_KPEA_K@Z : -871931 : -0.20% : 0.11% : -0.0008%
?fgMorphTree@Compiler@@QEAAPEAUGenTree@@PEAU2@PEAUMorphAddrContext@1@@Z : -883522 : -0.10% : 0.11% : -0.0008%
?RenamePushDef@SsaBuilder@@AEAAIPEAUGenTree@@PEAUBasicBlock@@I_N@Z : -886870 : -1.13% : 0.11% : -0.0008%
?EnsureCoversInd@?$JitExpandArray@PEAUChunk@ValueNumStore@@@@IEAAXI@Z : -892604 : -0.31% : 0.11% : -0.0008%
?gtSetEvalOrder@Compiler@@QEAAIPEAUGenTree@@@Z : -905597 : -0.05% : 0.12% : -0.0008%
??$resolveRegisters@$00@LinearScan@@QEAAXXZ : -934671 : -0.15% : 0.12% : -0.0008%
?fgInterBlockLocalVarLiveness@Compiler@@QEAAXXZ : -998691 : -0.17% : 0.13% : -0.0009%
?optVnCopyProp@Compiler@@QEAA?AW4PhaseStatus@@XZ : -1013931 : -0.51% : 0.13% : -0.0009%
?RewriteNode@Rationalizer@@AEAA?AW4fgWalkResult@Compiler@@PEAPEAUGenTree@@AEAV?$ArrayStack@PEAUGenTree@@@@@Z : -1016909 : -0.13% : 0.13% : -0.0009%
?fgValueNumberStore@Compiler@@QEAAXPEAUGenTree@@@Z : -1027387 : -0.72% : 0.13% : -0.0009%
?fgComputeLifeLIR@Compiler@@QEAAXAEAPEA_KPEAUBasicBlock@@AEBQEA_K@Z : -1032432 : -0.25% : 0.13% : -0.0009%
?BuildNode@LinearScan@@AEAAHPEAUGenTree@@@Z : -1036010 : -0.15% : 0.13% : -0.0009%
?fgPerBlockLocalVarLiveness@Compiler@@QEAAXXZ : -1052193 : -0.16% : 0.13% : -0.0009%
?InsertPhiFunctions@SsaBuilder@@AEAAXPEAPEAUBasicBlock@@H@Z : -1106749 : -0.58% : 0.14% : -0.0010%
?optBlockCopyProp@Compiler@@QEAA_NPEAUBasicBlock@@PEAV?$JitHashTable@IU?$JitSmallPrimitiveKeyFuncs@I@@PEAV?$ArrayStack@VCopyPropSsaDef@Compiler@@@@VCompAllocator@@VJitHashTableBehavior@@@@@Z : -1144389 : -0.32% : 0.15% : -0.0010%
?fgValueNumberTree@Compiler@@QEAAXPEAUGenTree@@@Z : -1149567 : -0.17% : 0.15% : -0.0010%
`Compiler::optCopyPropPushDef'::`2'::<lambda_1>::operator() : -1166866 : -1.11% : 0.15% : -0.0010%
?optAssertionProp@Compiler@@QEAAPEAUGenTree@@AEBQEA_KPEAU2@PEAUStatement@@PEAUBasicBlock@@@Z : -1194100 : -0.26% : 0.15% : -0.0010%
?fgMorphSmpOp@Compiler@@AEAAPEAUGenTree@@PEAU2@PEAUMorphAddrContext@1@PEA_N@Z : -1206778 : -0.07% : 0.15% : -0.0010%
?fgComputeLifeTrackedLocalDef@Compiler@@QEAA_NAEAPEA_KAEBQEA_KAEAVLclVarDsc@@PEAUGenTreeLclVarCommon@@@Z : -1229223 : -0.67% : 0.16% : -0.0011%
?optAssertionPropMain@Compiler@@QEAA?AW4PhaseStatus@@XZ : -1373368 : -0.34% : 0.18% : -0.0012%
?PerBlockAnalysis@LiveVarAnalysis@@AEAA_NPEAUBasicBlock@@_N1@Z : -1390878 : -0.10% : 0.18% : -0.0012%
??$buildIntervals@$00@LinearScan@@QEAAXXZ : -1420236 : -0.28% : 0.18% : -0.0012%
?doLinearScan@LinearScan@@UEAA?AW4PhaseStatus@@XZ : -1472138 : -0.40% : 0.19% : -0.0013%
?WalkTree@?$GenTreeVisitor@V?$GenericTreeWalker@$0A@$00$0A@$00@@@@QEAA?AW4fgWalkResult@Compiler@@PEAPEAUGenTree@@PEAU4@@Z : -1480816 : -0.34% : 0.19% : -0.0013%
?UpdateLifeVar@?$TreeLifeUpdater@$00@@AEAAXPEAUGenTree@@PEAUGenTreeLclVarCommon@@@Z : -1492533 : -0.53% : 0.19% : -0.0013%
?resolveLocalRef@LinearScan@@AEAAXPEAUBasicBlock@@PEAUGenTreeLclVar@@PEAVRefPosition@@@Z : -1708209 : -0.47% : 0.22% : -0.0015%
?optVNAssertionPropCurStmtVisitor@Compiler@@KA?AW4fgWalkResult@1@PEAPEAUGenTree@@PEAUfgWalkData@1@@Z : -1787018 : -0.34% : 0.23% : -0.0015%
?genCodeForBBlist@CodeGen@@IEAAXXZ : -1863492 : -0.20% : 0.24% : -0.0016%
?optAssertionGen@Compiler@@QEAAXPEAUGenTree@@@Z : -1953068 : -0.28% : 0.25% : -0.0017%
?lvaComputeRefCounts@Compiler@@QEAAX_N0@Z : -1990712 : -0.26% : 0.26% : -0.0017%
jitstd::`anonymous namespace'::quick_sort<unsigned int *,LclVarDsc_BlendedCode_Less> : -2055001 : -0.47% : 0.26% : -0.0018%
?allocateMemory@ArenaAllocator@@QEAAPEAX_K@Z : -2058385 : -0.08% : 0.26% : -0.0018%
?fgMarkUseDef@Compiler@@AEAAXPEAUGenTreeLclVarCommon@@@Z : -2423637 : -0.39% : 0.31% : -0.0021%
?allocateRegisters@LinearScan@@QEAAXXZ : -2501343 : -0.08% : 0.32% : -0.0022%
??$select@$0A@@RegisterSelection@LinearScan@@QEAA_KPEAVInterval@@PEAVRefPosition@@@Z : -3296353 : -0.12% : 0.42% : -0.0029%
?DoPhase@@YAXPEAVCompiler@@W4Phases@@P81@EAA?AW4PhaseStatus@@XZ@Z : -3654638 : -1.32% : 0.47% : -0.0032%
?processBlockStartLocations@LinearScan@@AEAAXPEAUBasicBlock@@@Z : -3668164 : -0.25% : 0.47% : -0.0032%
ArrayStack<`Compiler::fgTailMerge'::`2'::PredInfo>::Emplace<`Compiler::fgTailMerge'::`2'::PredInfo &> : -5529385 : -100.00% : 0.71% : -0.0048%
?optCopyProp@Compiler@@QEAA_NPEAUBasicBlock@@PEAUStatement@@PEAUGenTreeLclVarCommon@@IPEAV?$JitHashTable@IU?$JitSmallPrimitiveKeyFuncs@I@@PEAV?$ArrayStack@VCopyPropSsaDef@Compiler@@@@VCompAllocator@@VJitHashTableBehavior@@@@@Z : -10813657 : -0.93% : 1.39% : -0.0094%
`Compiler::compCompile'::`2'::<lambda_2>::operator() : -12285906 : -100.00% : 1.57% : -0.0106%
?fgTailMerge@Compiler@@QEAA?AW4PhaseStatus@@XZ : -28273557 : -100.00% : 3.62% : -0.0245%
`Compiler::fgTailMerge'::`2'::<lambda_1>::operator() : -128142453 : -100.00% : 16.42% : -0.1109% (Note that
I collected stats on the checks, and over libraries.pmi |
What happens if you just restrict head merge to |
It over halves the TP impact with only a couple non-test regressions -- I will switch to that. I wonder how much it is about visiting switches compared to the |
Failures are known according to build analysis. |
Add a pass that does head merging to complement the existing tail merging pass. Unlike tail merging this either requires spilling or reordering the first statement with the terminator node of the predecessor, which requires some interference checking.
Fix #90017